Method and apparatus for indexing files on a computer system.

A method and apparatus for automatically indexing and retrieving files in large computer file systems are provided. Keywords are automatically extracted from files to be indexed and used as the entries in an index file. Each file having one of the index entries as a keyword is associated in the index with that keyword. If a file is to be retrieved, and its content but not its name or location is known, its keywords are entered and its identifying information will be displayed (along with that of other files having that keyword), facilitating its retrieval.






EP 0 364 180 A2 






keyword, one can tell the order in which each file became associated with that keyword. The "dbAddIndex" routine then ends at step 25.

FIG. 3 is a flow diagram of the "findManFile" routine of Appendix C, which is exemplary of routines that can be used to retrieve files that have been indexed. At step 30, the index or indices to be searched are specified by the user or other program calling the routine. At step 31, the routine checks to see if all indices to be searched have been searched. If so, the routine ends at step 32. Otherwise, the next index is searched at step 33. If at step 34 the search text is found as a keyword in the index being searched, then the file or files with which it is associated are added to a reference list along with all identifying information and weights. The routine then returns to step 31 to see if any more indices need to be searched. Once the routine has ended at step 32, the reference list is presented to the user or other program that called the "findManFile" routine.
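The retrieval loop of FIG. 3 can be sketched as follows. This is a minimal illustration only and is not the Appendix C code: the in-memory layout (index name, then keyword, then a list of file/weight pairs) and the names `Reference` and `find_files` are assumptions introduced here for the example.

```python
from typing import Dict, List, NamedTuple, Tuple


class Reference(NamedTuple):
    """One entry of the reference list (hypothetical structure)."""
    index_name: str
    file_id: str    # stands in for the file's identifying information
    weight: float


# Assumed layout of one index: keyword -> [(file_id, weight), ...]
Index = Dict[str, List[Tuple[str, float]]]


def find_files(search_text: str, indices: Dict[str, Index]) -> List[Reference]:
    """Search every specified index for search_text and collect references."""
    references: List[Reference] = []
    # Steps 31 and 33: visit each index until all have been searched.
    for index_name, index in indices.items():
        # Step 34: is the search text a keyword in this index?
        entries = index.get(search_text)
        if entries:
            # Add the associated files, with their weights, to the list.
            for file_id, weight in entries:
                references.append(Reference(index_name, file_id, weight))
    # Step 32: all indices searched; hand the list back to the caller.
    return references
```

A caller would pass the search text and the indices chosen at step 30, then display or further process the returned list.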



Hardware System 

While the present invention may advantageously be implemented on nearly any conventional computer system, an exemplary hardware system 400 on which the present invention is implemented is shown in FIG. 4.

FIG. 4 shows a preferred embodiment of a hardware system 400 implementing the present invention as part of a computer system. In FIG. 4, system 400 includes CPU 401, main memory 402, video memory 403, a keyboard 404 for user input, printer 405, and mass storage 406, in which the files to be indexed and searched are stored. Mass storage 406 may include both fixed and removable media using any one or more of magnetic, optical or magneto-optical storage technology or any other available mass storage technology. The files can be entered from keyboard 404 or directly into mass storage 406 on removable media; if system 400 is part of a network of computer systems, the file system might include all or part of the mass storage available on other systems on the network. These components are interconnected via conventional bidirectional system bus 407. Bus 407 contains 32 address lines for addressing any portion of memory 402 and 403. System bus 407 also includes a 32-bit data bus for transferring data between and among CPU 401, main memory 402, video memory 403, and mass storage 406. In the preferred embodiment of system 400, CPU 401 is a Motorola 68030 32-bit microprocessor, but any other suitable microprocessor or microcomputer may alternatively be used. Detailed information about the 68030 microprocessor, in particular concerning its instruction set, bus structure, and control lines, is available from MC68030 User's Manual, published by Motorola Inc. of Phoenix, Arizona.

Main memory 402 of system 400 comprises eight megabytes of conventional dynamic random access memory, although more or less memory may suitably be used. Video memory 403 comprises 256K bytes of conventional dual-ported video random access memory. Again, depending on the resolution desired, more or less such memory may be used. Connected to a port of video memory 403 is video multiplex and shifter circuitry 408, to which in turn is connected video amplifier 409. Video amplifier 409 drives cathode-ray tube (CRT) raster monitor 410. Video multiplex and shifter circuitry 408 and video amplifier 409, which are conventional, convert pixel data stored in video memory 403 to raster signals suitable for use by monitor 410. Monitor 410 is of a type suitable for displaying graphic images having a resolution of 1120 pixels wide by 832 pixels high.
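The relationship between resolution and video memory size can be checked with a little arithmetic. The sketch below assumes a packed frame buffer with a fixed number of bits per pixel; the text does not state the pixel depth, so `bits_per_pixel` is a hypothetical parameter used only for illustration.

```python
WIDTH, HEIGHT = 1120, 832          # monitor resolution given in the text
VIDEO_MEMORY_BYTES = 256 * 1024    # 256K bytes of video memory


def frame_bytes(bits_per_pixel: int) -> int:
    """Bytes needed for one full frame at the given (assumed) pixel depth."""
    return WIDTH * HEIGHT * bits_per_pixel // 8


def fits(bits_per_pixel: int) -> bool:
    """True if one frame at this depth fits in the 256K-byte video memory."""
    return frame_bytes(bits_per_pixel) <= VIDEO_MEMORY_BYTES


for depth in (1, 2, 4):
    print(depth, frame_bytes(depth), fits(depth))
```

Under these assumptions a 1-bit frame needs 116,480 bytes and a 2-bit frame needs 232,960 bytes, both within 256K bytes, while a 4-bit frame would need 465,920 bytes and would require more video memory.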

The reference lists produced by the invention can be stored in mass storage 406, displayed to the user on monitor 410, or printed out on printer 405.

Thus it is seen that a method and apparatus for automatic indexing of files in a computer system are provided. One skilled in the art will appreciate that the present invention can be practiced by other than the described embodiments, which are presented for purposes of illustration and not of limitation, and the present invention is limited only by the claims which follow.


Claims 

1. For use in a computer system having a mass storage file system, apparatus for automatic indexing of files stored in said file system, said apparatus comprising:

means for maintaining at least one index file on said file system, said at least one index file containing a date of most recent indexing and, for each of at least some of said stored files, information regarding a date of last updating for that file, and an association of that file with at least one keyword; and

means for automatically updating said at least one index file when a new file is added to said file system and when an existing file is updated, said automatic updating means comprising:

means for, for each file, comparing said date of most recent indexing and said date of last updating for that file,

means for determining if said date of last updating of that file is later than said date of most recent indexing, and

means for updating said keyword association information upon a determination by said determining means that said date of last updating of that file is later than said date of most recent indexing.

2. The apparatus of claim 1 wherein:

each of said stored files has associated therewith a file type; and

said automatic updating means further comprises:

means for, for each file, examining said file type, and

means for, based on said file type, extracting from said file keywords and information concerning the relative occurrences of said keywords.

3. The apparatus of claim 1 further comprising means for retrieving said files based on said keyword association information.

4. For use in a computer system having a mass storage file system, a method for automatic indexing of files stored in said file system, said method comprising:

maintaining at least one index file on said file system, said at least one index file containing a date of most recent indexing and, for each of at least some of said stored files, information regarding a date of last updating for that file, and an association of that file with at least one keyword; and

automatically updating said at least one index file when a new file is added to said file system and when an existing file is updated, said automatic updating step comprising:

for each file, comparing said date of most recent indexing and said date of last updating for that file,

determining if said date of last updating of that file is later than said date of most recent indexing, and

updating said keyword association information upon a determination that said date of last updating of that file is later than said date of most recent indexing.

5. The method of claim 4 wherein:

each of said stored files has associated therewith a file type; and

said automatic updating step further comprises:

for each file, examining said file type, and

based on said file type, extracting from said file keywords and information concerning the relative occurrences of said keywords.

6. The method of claim 4 further comprising retrieving said files based on said keyword association information.
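
By way of illustration only, the date comparison recited in claims 1 and 4 might be sketched as below. This is not the routine of the appendices: the `IndexFile` structure, the `update_index` function, and the whitespace-splitting `extract_keywords` helper are all hypothetical names and simplifications introduced for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Dict, Iterable, List, Set, Tuple


@dataclass
class IndexFile:
    """Assumed index layout: one indexing date plus keyword -> file paths."""
    last_indexed: datetime
    keywords: Dict[str, Set[str]] = field(default_factory=dict)


def extract_keywords(text: str) -> Set[str]:
    """Hypothetical stand-in for keyword extraction: lowercase words."""
    return {word.lower() for word in text.split()}


def update_index(
    index: IndexFile,
    files: Iterable[Tuple[str, datetime, str]],   # (path, last_updated, text)
    extract: Callable[[str], Set[str]],
    now: datetime,
) -> List[str]:
    """Re-index only files updated after the most recent indexing."""
    reindexed: List[str] = []
    for path, last_updated, text in files:
        # Compare the file's date of last updating with the indexing date.
        if last_updated > index.last_indexed:
            # Drop stale associations for this file, pruning empty keywords.
            for keyword in list(index.keywords):
                index.keywords[keyword].discard(path)
                if not index.keywords[keyword]:
                    del index.keywords[keyword]
            # Record the fresh keyword associations.
            for keyword in extract(text):
                index.keywords.setdefault(keyword, set()).add(path)
            reindexed.append(path)
    index.last_indexed = now   # becomes the new date of most recent indexing
    return reindexed
```

In this sketch a newly added file is handled by the same test, on the assumption that its date of last updating is later than the date of most recent indexing.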